Item response theory modeling of divergent thinking fluency scores in a Bayesian regression framework

Many opportunities, and a few challenges

Nils Myszkowski

Department of Psychology, Pace University

March 13, 2025

Modeling fluency scores

Fluency scores

  • In divergent thinking tests, fluency scores are the count of unique responses provided by a respondent to an item.

  • Traditionally, analyses of fluency scores rely on classical test theory (e.g., sum scores, \(\alpha\), traditional factor analysis).

  • All classical test theory models assume:

    • Normally distributed item scores
    • A linear relation between the trait and the item response
    • Constant error variance at all levels of the trait

Non-normal distributions

Non-linear item responses and heteroscedasticity

The 2-parameter Poisson counts model (2PPCM)

In the 2PPCM (Myszkowski & Storme, 2021), the fluency score is drawn from a Poisson distribution…

\[X_{ij} \sim \text{Poisson}(e^{a_j\theta_i + b_j})\] …with the rate/expectation (and variance) \(e^{a_j\theta_i + b_j}\), where:

  • \(\theta_i\): person \(i\)’s latent fluency (\(\theta_i \sim N(0,1)\))
  • \(a_j\): the slope/discrimination/loading of item \(j\)
  • \(b_j\): the easiness of item \(j\)
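The generative model above can be sketched in a few lines of base R; the numbers of persons and items and the parameter values below are illustrative, not taken from any dataset:

```r
# Simulate fluency scores from the 2PPCM (illustrative parameter values)
set.seed(1)
n_persons <- 200
a <- c(1.0, 0.8, 1.2)              # item discriminations a_j
b <- c(2.0, 2.2, 1.8)              # item easiness b_j (log scale)
theta <- rnorm(n_persons)          # latent fluency theta_i ~ N(0, 1)

# Rate for person i, item j: exp(a_j * theta_i + b_j)
lambda <- exp(sweep(outer(theta, a), 2, b, "+"))

# Counts drawn from a Poisson distribution with that rate
X <- matrix(rpois(length(lambda), lambda), nrow = n_persons)
```

Note that because the rate is \(e^{a_j\theta_i + b_j}\), the simulated counts are non-negative, skewed, and have variance that grows with the trait, unlike what classical test theory assumes.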

Model

Estimation

Maximum likelihood approaches:

  • Generalized SEM software (e.g., Mplus, Stata)
    • Commercial, closed-source packages
  • Dedicated R packages (e.g., countirt)
    • Risk of “abandonware”
  • Generalized linear mixed models (GLMM) software (e.g., lme4)
    • Does not accommodate variable discrimination parameters; not very flexible

Can a general-purpose Bayesian estimation framework do better?

Our questions

  • Is it feasible to estimate log-linear count IRT models in a Bayesian framework with packages not dedicated to count IRT?

  • Can we obtain results similar to maximum likelihood estimates?

  • Are there benefits to this approach? Are they easily attainable and useful?

  • What are the (current) limitations?

Markov Chain Monte Carlo estimation (in a nutshell)

Inputs

  • Model
  • Dataset
  • More or less vague probability distributions expressing plausible values for all model parameters (prior distributions)

Process

  • Sample from the model parameters’ probability distributions
  • Generate many plausible parameter values based on data & priors
  • Favor values that best fit the data

Output

  • Updated probability distributions for all parameters (posterior distributions)
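The input→process→output loop above can be illustrated with a toy random-walk Metropolis sampler (a simpler MCMC algorithm than the Hamiltonian Monte Carlo that Stan uses), estimating a single mean with a vague prior; the dataset and tuning values are illustrative:

```r
# Toy MCMC: random-walk Metropolis for the mean of a normal sample
set.seed(2)
y <- rnorm(50, mean = 1.5, sd = 1)         # toy dataset

log_post <- function(mu) {                 # log posterior (up to a constant):
  sum(dnorm(y, mu, 1, log = TRUE)) +       #   likelihood
    dnorm(mu, 0, 10, log = TRUE)           #   vague prior
}

n_iter <- 5000
draws <- numeric(n_iter)
mu <- 0                                    # starting value
for (i in seq_len(n_iter)) {
  proposal <- mu + rnorm(1, 0, 0.5)        # sample a nearby candidate value
  # Favor values that better fit the data and priors; keep worse values
  # occasionally, in proportion to their relative plausibility
  if (log(runif(1)) < log_post(proposal) - log_post(mu)) mu <- proposal
  draws[i] <- mu
}
# draws (after discarding warmup) approximates the posterior of mu
```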

Stan and brms

Stan (Carpenter et al., 2017): A probabilistic programming language suited for Bayesian estimation using Hamiltonian Monte Carlo (HMC).

  • Fast estimation, good convergence, flexibility of prior distributions and models

brms (Bürkner, 2017): An R package to estimate various models in Stan using regression-like syntax (e.g., y ~ x1 + x2)

  • Has been shown to accommodate logistic item response models (Bürkner, 2020); more convenient than raw Stan syntax

How convenient?

It’s not too bad! See our paper (Myszkowski & Storme, 2025)

…but here’s a quick look!

Does it work?

Example dataset

  • Publicly available dataset from a special issue (Forthmann et al., 2019)

  • 202 respondents (variable Person)

  • 3 alternate uses tasks (rope, paperclip, garbage bag) (variable Item)

Specifying the model

formula_2PPCM <- bf(
  Score ~ 0 + slope * theta + easiness, #linear part of the item response model
  theta ~ 0 + (1 | Person),             #Theta is a random effect of the person
  slope ~ 0 + Item,                     #Slope is a fixed effect of the item
  easiness ~ 0 + Item,                  #Easiness is a fixed effect of the item
  nl = TRUE,                            #Activate <Weird Model Mode>
  family = poisson(link = "log")        #Log-linear Poisson model
)

[Path diagram of the 2PPCM: Person (random effect) → theta; Item (fixed effect) → slope and easiness; theta, slope, and easiness → log(Fluency)]

Estimation

fit_2PPCM <- brm(
  formula = formula_2PPCM,                #Passing the model formula
  data = data_long,                       #Passing the dataset
  prior = prior("constant(1)", 
                class = "sd", 
                group = "Person", 
                nlpar = "theta"),         #Fix SD(theta) = 1 to identify the model
  iter = 2000, warmup = 500, chains = 4   #Technical options
  )

Results comparable with maximum likelihood

With non/weakly informative priors


  • Factor scores

  • SE / Posterior uncertainty

We can do all things (count) IRT!

  • Factor scores (point estimate, error, CI)
  • Item parameters (point estimate, error, CI)
  • Item response functions
  • Item and test information functions
  • Sample/person level reliability
  • Handling of missing data
  • Covariate-adjusted frequency plots
  • Calculate dispersion overall and by item
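For instance, item and test information have a closed form under the 2PPCM: the Fisher information of item \(j\) at \(\theta\) is \(a_j^2 e^{a_j\theta + b_j}\), the squared slope times the expected count. A minimal sketch, with illustrative parameter values:

```r
# 2PPCM item information: I_j(theta) = a_j^2 * exp(a_j * theta + b_j)
item_info <- function(theta, a, b) a^2 * exp(a * theta + b)

theta_grid <- seq(-3, 3, length.out = 100)          # latent trait grid
info_1 <- item_info(theta_grid, a = 1.0, b = 2.0)   # one item's information
info_2 <- item_info(theta_grid, a = 0.8, b = 2.2)   # another item's
test_info <- info_1 + info_2                        # test information = sum over items
```

Unlike binary IRT models, information here increases with \(\theta\): respondents with higher latent fluency produce more responses, and hence more information.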

What are the advantages?

A more natural handling of uncertainty

Full posterior distributions of all parameters.

Probabilistic conclusions about items

What is the probability that finding alternate uses of a rope is easier than finding alternate uses of a garbage bag?
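Such a probability is simply the proportion of posterior draws in which one item's easiness exceeds the other's. A minimal sketch using simulated normal draws as stand-ins for the posterior samples (with a real fit, one would extract the draws from the fitted brms model instead; the means and SDs below are made up):

```r
# Stand-in posterior draws for two easiness parameters (illustrative values;
# with a real model, extract the draws from the brms fit instead)
set.seed(3)
easiness_rope <- rnorm(4000, mean = 2.1, sd = 0.05)
easiness_bag  <- rnorm(4000, mean = 2.0, sd = 0.05)

# P(rope is easier than garbage bag) = share of draws where it is
p_rope_easier <- mean(easiness_rope > easiness_bag)
```

The same draw-counting logic answers the person-level question on the next slide.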

Probabilistic conclusions about persons

What is the probability that person 1’s fluency is more than 1 standard deviation higher than person 2’s fluency?

Hierarchical structure for item parameters

Treat item characteristics (e.g. slope) as random deviations from a shared distribution.

formula_2PPCM_ri <- bf(
  Score ~ 0 + slope * theta + easiness,
  theta ~ 0 + (1 | Person),
  slope ~ 0 + (1 | Item),
  easiness ~ 0 + (1 | Item),
  nl = TRUE,                         
  family = poisson(link = "log")
)

Extensibility to explanatory models

Item covariates (e.g., object type):

formula_2PPCM_expl <- bf(
  Score ~ 0 + slope * theta + easiness,
  theta ~ 0 + (1 | Person),
  slope ~ 0 + Item,
  easiness ~ 0 + object_type,
  nl = TRUE,                         
  family = poisson(link = "log")
)

Person covariates (i.e., latent regression / latent mean differences):

formula_2PPCM_latreg <- bf(
  Score ~ 0 + slope * theta + easiness,
  theta ~ 0 + training + (1 | Person),
  slope ~ 0 + Item,
  easiness ~ 0 + Item,
  nl = TRUE,                         
  family = poisson(link = "log")
)

Bayesian regularization

  • By using informative priors, we can avoid unrealistic parameter values and unstable models.

  • Probably useful for extensions (e.g., avoiding differential item functioning false positives).

  • Particularly useful in small datasets that we don’t want to trust “maximally”.

Bayesian stacking

We can combine multiple models and/or multiple sets of priors using Bayesian stacking

  • For example, we can obtain \(\theta\) posterior distributions from different models, which we average, weighted by their fit.

  • Avoids reliance on a single model/set of priors, leading to more robust predictions.

What are the outstanding issues?

Convergence issues

In this dataset, default priors were sufficient for the Rasch Poisson counts model (RPCM), but not for the 2PPCM.

  • Check the paper for more info. It’s also refined in my poster (on OSF repository).
  • But it is still an extra step to implement and justify.

Some next steps

  • Replicate in other divergent thinking datasets
  • Replicate for other count responses (e.g., counts of errors, verbal fluency)
  • Estimate item-dependent dispersion parameter models (Forthmann et al., 2019)
  • Estimate multidimensional models
  • Test measurement invariance with this framework
  • Model tests with both count responses and non-count responses

Thank you!

Find this presentation at https://osf.io/9f4eu/.

References

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Bürkner, P.-C. (2020). Analysing standard progressive matrices (SPM–LS) with Bayesian item response models. Journal of Intelligence, 8(1), 5. https://doi.org/10.3390/jintelligence8010005
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76, 1–32. https://doi.org/10.18637/jss.v076.i01
Forthmann, B., Gühne, D., & Doebler, P. (2019). Revisiting dispersion in count data item response theory models: The Conway–Maxwell–Poisson counts model. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12184
Myszkowski, N., & Storme, M. (2021). Accounting for variable task discrimination in divergent thinking fluency measurement: An example of the benefits of a 2-parameter Poisson counts model and its bifactor extension over the Rasch Poisson counts model. The Journal of Creative Behavior, 55(3), 800–818. https://doi.org/10.1002/jocb.490
Myszkowski, N., & Storme, M. (2025). Bayesian Estimation of Generalized Log-Linear Poisson Item Response Models for Fluency Scores Using brms and Stan. Journal of Intelligence, 13(3), 26. https://doi.org/10.3390/jintelligence13030026